Communication dynamics in finite capacity social networks 

In communication networks, structure and dynamics are tightly coupled. The structure controls
the flow of information and is itself shaped by the dynamical process of information exchanged 
between nodes. In order to reconcile structure and dynamics, a generic model, based on the local 
interaction between nodes, is considered for the communication in large social networks. In agree- 
ment with data from a large human organization, we show that the flow is non-Markovian and 
controlled by the temporal limitations of individuals. We confirm the versatility of our model by 
predicting simultaneously the degree-dependent node activity, the balance between information in- 
put and output of nodes and the degree distribution. Finally, we quantify the limitations to network 
analysis when it is based on data sampled over a finite period of time. 



Limitations on the processing capacities of nodes and 
links have a profound impact on the flow of information 
in online communication networks [1, 2], the spreading
of diseases in human encounter networks 31- and in social
networks [IHZ], where links between interacting individuals
can be highly volatile [8]. It is often assumed
that communication takes place in an unrestrained way
on a set of established connections, thereby neglecting
that structure and dynamics are interdependent. Here
we consider the evolution of a network where links form 
as a result of non-Markovian interaction between nodes. 
In a time-limited environment, communication demands 
prioritization, which is evident from the analysis of
correspondence patterns [7113]. Hence, information flow on a
network is a result of individuals' choices, which are
influenced by the state of surrounding nodes. In natural [10]
and online [11-15] social networks, the nodes' activity is
a non-trivial function of their degree. The activity level 
can be quantified by the number of social relationships si- 
multaneously maintained by an individual. This number 
has been suggested to reflect basic cognitive capabilities 
of primates [10] and humans [11, 13-15]. Here we model
a network of individuals acting under time constraints 
and compare with a complete dataset of email communi- 
cation in a large organization. The model is discussed in 
the context of other communication networks. We pre- 
dict the information processing capacity of individuals as 
well as the structure of the network that they form. 

We use representative communication data from a 
large social organization, the University of Oslo. The 
data comprise a complete time-ordered list of $2.3 \times 10^7$
emails between 5600 employees, 30 000 students and
approximately $10^6$ people outside the organization over a
period of three months (Sep-Nov 2010). The email con- 
tent was not recorded and identities of individuals were 
encrypted. We limit the influence of unsolicited bulk 
emails by disregarding those simultaneously sent to more 
than five recipients. However, the results are not sensi- 
tive to the filtering of bulk emails [16]. Previous work 
on email data has considered static network structures 

FIG. 1. Weighted random, unweighted random and directed
information flow. The error bars are estimated by bootstrap- 
ping. Inset: Similar plot using model data. The quantitative 
discrepancy between model and data results from the relative 
dominance of degree-one nodes in the empirical data. 



Results - We show that the communication is non- 
Markovian by comparing random and directed informa- 
tion flow: (i) Random flow is given by random walks 
on the network. The walker follows an empirical
time-independent jump probability $N_{ij}/\sum_k N_{ik}$ from
node $i$ to node $j$. The sum is taken over all nodes and
$N_{ij}$ is the number of emails sent from $i$ to $j$ during the
timespan of the data. (ii) Directed flow is given by the
chronological email exchange. Starting from a random 
node i, we wait for i to send an email, say to j. We then 
jump to j and wait for the next message j sends either 
back to i or to a new node k. Repeating this, we ob- 
tain a finite trajectory within the timespan of the data. 
The number of unique nodes visited by the directed and 
random flow as function of the number of jumps are com- 
pared by averaging over trajectories originating from all 
nodes (Fig. 1). On average, directed flow visits relatively
fewer nodes than random flow, indicating a significant
correlation between sent and received messages.
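
The comparison of the two flows can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code behind Fig. 1: the log format (a list of `(time, sender, recipient)` tuples), the function names and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def random_flow(emails, start, n_jumps, rng):
    """(i) Random walk with time-independent jump probability N_ij / sum_k N_ik."""
    out = defaultdict(list)
    for _, sender, recipient in emails:
        out[sender].append(recipient)  # repeated recipients encode the weights N_ij
    node, visited = start, {start}
    for _ in range(n_jumps):
        targets = out.get(node)
        if not targets:  # node never sent an email: the walk stops here
            break
        node = rng.choice(targets)
        visited.add(node)
    return len(visited)

def directed_flow(emails, start, n_jumps):
    """(ii) Follow the chronological exchange: jump with the current node's next email."""
    node, visited, jumps = start, {start}, 0
    for _, sender, recipient in sorted(emails):  # process the log in time order
        if jumps == n_jumps:
            break
        if sender == node:
            node = recipient
            visited.add(node)
            jumps += 1
    return len(visited)

# Hypothetical toy log: (time, sender, recipient)
toy_emails = [(1, "a", "b"), (2, "b", "a"), (3, "a", "b"), (4, "b", "c"), (5, "c", "a")]
n_directed = directed_flow(toy_emails, "a", 2)   # a -> b -> a visits 2 unique nodes
n_random = random_flow(toy_emails, "a", 2, random.Random(0))
```

Averaging both counts over trajectories started from every node, for a range of jump numbers, would give curves of the kind compared in the text.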

FIG. 2. Average number of messages sent per message
received. Observational data are marked by "o". The solid line
is a best fit by Eq. (6). The dotted lines mark the peak
and the dashed diagonal line shows $\delta = \alpha$. Inset: out-degree
distribution for model and empirical data. The dashed line
denotes the scale-break $k_{sb} \approx 250$. Mean degree is 5.4 (Twitter
data yields a mean degree of 8.8 and a similar exponent
for the degree distribution [23]). Note the double-log scales.

Our model requires nodes to perform a trade-off be- 
tween replying to others and initiating new conversations. 
Specifically, consider $\mathcal{N}$ nodes, each initially connected to
one other node. The nodes have a limited capacity and
can send a maximum of $N_{\max}$ messages in a timestep
$\Delta t = 1$ day. The dynamics follows from three possible
actions for a node $i$ of out-degree $k_i$:

(a) $i$ processes received emails and, if $i$ has sent fewer than
$N_{\max}$ messages, any received email is replied to with a
probability proportional to the sender's degree. Emails
not replied to within $\Delta t$ are subsequently deleted. In
total, $\delta_2$ replies are sent by this action.

(b) If fewer than $N_{\max}$ emails have been sent in (a),
the remaining capacity $N_{\max} - \delta_2$ is available for sending
messages, called $\delta_1$, to previously established contacts.
The probability of sending a message to a contact is given
by a constant $r_{\rm ini}$. Hence, granted sufficient capacity, on
average $r_{\rm ini} \cdot k_i$ messages are initiated by $i$. Nodes with
low $k_i$ will generally not reach their full capacity.

(c) Nodes establish new contacts by sending requests
with a probability $r_{\rm req}$. The probability that a request
is sent to a node $j$ is proportional to the degree of $j$, $k_j$.
A link is established between i and j, if j in the next 
timestep according to (a) replies to i. In reality, contacts 
might as well be established by face-to-face encounters, 
i.e. via channels not recorded explicitly in our data. 
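
The three actions can be sketched for a single node and a single timestep as follows. This is an illustrative sketch, not the authors' simulation code. It assumes that the reply probability in (a) is `min(1, beta * k_sender)`, with `beta` the proportionality constant introduced later in the text, and that a request in (c) is only sent if capacity is left after (a) and (b); the text does not state the latter explicitly. The function name and argument layout are hypothetical, and the numerical values are the fitted parameters quoted later in the paper.

```python
import random

def node_timestep(sender_degrees, k_i, r_ini, r_req, beta, n_max, rng):
    """Return (delta_0, delta_1, delta_2) sent by one node during one timestep."""
    # (a) Reply to received emails with probability proportional to the sender's
    #     degree (assumed form: min(1, beta * k_sender)), up to n_max replies.
    delta_2 = 0
    for k_sender in sender_degrees:
        if delta_2 >= n_max:
            break
        if rng.random() < min(1.0, beta * k_sender):
            delta_2 += 1
    # (b) Use the remaining capacity for messages to the k_i established
    #     contacts, each contacted with probability r_ini.
    delta_1 = 0
    for _ in range(k_i):
        if delta_1 >= n_max - delta_2:
            break
        if rng.random() < r_ini:
            delta_1 += 1
    # (c) Send a contact request with probability r_req (assumption: only if
    #     capacity is left after (a) and (b)).
    has_capacity = delta_1 + delta_2 < n_max
    delta_0 = 1 if has_capacity and rng.random() < r_req else 0
    return delta_0, delta_1, delta_2

# A node flooded by emails from very high-degree senders spends all capacity on replies.
busy = node_timestep([1000] * 20, k_i=50, r_ini=0.023, r_req=0.13,
                     beta=0.004, n_max=12, rng=random.Random(1))
```

In this sketch the saturated node returns `(0, 0, 12)`: all twelve messages are replies, which is the regime the text later associates with degrees beyond the capacity limit.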

The total number of messages $\delta$ sent by a node in
$\Delta t$ is the sum $\delta = \delta_0 + \delta_1 + \delta_2$. Analogously, messages
received by a node in the same timestep are termed
$\alpha = \alpha_0 + \alpha_1 + \alpha_2$. Nodes have an average lifetime $\tau$ and
are therefore removed from the network with a probability
$\Delta t/\tau$. For every node removed, a new node with
a single random connection to an existing node is introduced.
$\tau$ is estimated to be 5.8 years from the known
mean email user turnover time in the organization. The
parameters $r_{\rm ini}$, $r_{\rm req}$ and $N_{\max}$ are determined below.

According to (c), a link is established between i and 
j if one of the nodes sends a message to the other and 
receives a reply. The probability, $P_{ij}$, that a message is
sent from $i$ to $j$ in $\Delta t$ is proportional to $k_j$,
where we in the approximation assume that $k_i \ll \sum_l k_l$.
According to (a), the mean number of requests that $j$
receives during a timestep is proportional to $r_{\rm req}$ and $k_j$.
The probability for $j$ to reply to a request from nodes of
degree $k$ is proportional to $\beta k\,n(k)$, where $\beta$ is a constant
and $n(k)$ is the number of nodes with degree $k$. The
number of replies written by $j$ is the product of Eq. (1)
and the integral over nodes

$$\int \beta\, k\, n(k)\, dk \qquad (2)$$



Since nodes reply to requests and therefore establish new
links with a probability proportional to the sender degree,
$k\,n(k)$, the mean degree $k_c$ of a node's contacts is
$k_c = \int k^2 n(k)\,dk / \int k\,n(k)\,dk = \langle k^2\rangle/\langle k\rangle$, a number generally
larger than the mean degree $\langle k\rangle$ (Fig. 3).
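
As a small worked example of why the mean contact degree exceeds the mean degree, the ratio of the second to the first moment can be evaluated for a toy degree sequence. The sequence below is hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def mean_contact_degree(degrees):
    """k_c = <k^2>/<k> for a list of node degrees."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k2 / mean_k

toy_degrees = [1, 1, 1, 1, 6]                    # hypothetical: four leaves, one hub
mean_k = sum(toy_degrees) / len(toy_degrees)     # <k> = 2.0
k_c = mean_contact_degree(toy_degrees)           # <k^2>/<k> = 8.0 / 2.0 = 4.0
```

Here a single well-connected node is enough to make the typical contact twice as connected as the typical node.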

Consequently, the average degree increase of nodes of
degree $k$ per timestep becomes $r(k)\Delta t = 2\beta r_{\rm req} k \Delta t$. The
factor of 2 reflects the symmetry of sending and replying.
The rate of losing links is inversely proportional to $\tau$,
$d = k/\tau$. Hence, the net degree-growth rate becomes
$\Delta k/\Delta t = k \cdot r_0$, where $r_0 = 2\beta r_{\rm req} - 1/\tau$. As long
as a node has sufficient capacity to reply to all requests,
its degree increases approximately exponentially,
$k(t) \sim \exp(r_0 t)$.

The degree distribution follows from the consideration
that during $\Delta t$, a fraction of nodes $n(k)$ of degree $k$
changes their degree, $r_0[(k-1)\,n(k-1) - k\,n(k)]$, and a
fraction $1/\tau$ is removed. A continuum-limit approximation
yields



$$\frac{dn(k)}{dt} = -r_0\,\frac{\partial\,[k\,n(k)]}{\partial k} - \frac{n(k)}{\tau} \qquad (3)$$



The steady-state solution has the form $n(k) = n(1) \cdot k^{-\gamma}$,
where $\gamma = (1 - 1/(2\beta r_{\rm req}\tau))^{-1}$. The constant $n(1)$ is fixed
by integrating Eq. (3) over $k$ and by demanding that the
total number of nodes $\mathcal{N} = \int dk\, n(k)$ be constant. This
yields $n(1) = \mathcal{N}(\gamma - 1)$. The condition $0 < n(1) < \mathcal{N}$

FIG. 3. Mean recipient degree as function of degree (□) and
weighted by the number of messages sent to recipients (o).
The horizontal line shows $\langle k^2\rangle/\langle k\rangle$. The curves marked by "o"
and "△" are analogous to the unweighted case but for half and
one quarter of the observational period, respectively. Dashed
lines show projection of nodes with two values of $k$ for a
varying observation window. Note the double-log scale.



bounds the power-law exponent: $1 < \gamma < 2$. The data
yield $\gamma \approx 1.85$ (Fig. 2, inset).
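
The relation between the exponent and the model parameters, $\gamma = (1 - 1/(2\beta r_{\rm req}\tau))^{-1}$, can be checked numerically. The sketch below inverts it to estimate $\beta$ from the measured exponent, using the values quoted elsewhere in the text ($\tau = 5.8$ years, $r_{\rm req} = 0.13$ per day). Converting $\tau$ to days with 365 days per year is an assumption made here, and the function names are hypothetical.

```python
TAU_DAYS = 5.8 * 365.0   # mean node lifetime in timesteps of one day (assumed conversion)
R_REQ = 0.13             # request probability per timestep, as quoted in the text

def gamma_from_beta(beta, r_req=R_REQ, tau=TAU_DAYS):
    """Power-law exponent gamma = (1 - 1/(2 beta r_req tau))^-1."""
    return 1.0 / (1.0 - 1.0 / (2.0 * beta * r_req * tau))

def beta_from_gamma(gamma, r_req=R_REQ, tau=TAU_DAYS):
    """Invert the exponent relation for the reply constant beta."""
    return 1.0 / (2.0 * r_req * tau * (1.0 - 1.0 / gamma))

beta = beta_from_gamma(1.85)        # roughly 0.004
gamma_check = gamma_from_beta(beta) # recovers 1.85 up to rounding
```

With these inputs the inversion gives a value of $\beta$ close to 0.004, in line with the estimate stated in the parameter discussion.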

So far we have assumed that nodes have infinite capacity.
As a node's degree increases, it receives more messages
and this assumption becomes invalid. Consider the
number of messages received by $i$ per timestep. Contact
requests from other nodes amount to $\alpha_0 = r_{\rm req} k_i/\langle k\rangle$
messages. The senders of these messages are drawn from
a distribution $n(k)/\mathcal{N}$. The probability for $i$ to receive
a message from its contacts is proportional to $r_{\rm ini}$ and
$k_i$, hence $\alpha_1 = r_{\rm ini} \cdot k_i$. Analogously, as defined in (a), $i$
issues $\delta_0 = r_{\rm req}$ requests to recipients distributed according
to $p_1(k)$ (where $p_l(k) = k^l n(k)/\int k'^l n(k')\,dk'$) due
to the weighting of probabilities by the recipient degree.
In the same timestep $i$ sends $\delta_1 = \alpha_1$ messages to its
contacts. Finally we consider back-and-forth communication.
For every message sent by $i$ to $j$, a response is
returned with a probability $\beta k_i$ (Eq. (2)). In steady state,
the number of messages sent is identical for all timesteps
and therefore $i$ receives



$$\alpha_2 = \beta k_i\,(\delta_0 + \delta_1 + \delta_2) \qquad (4)$$



replies to messages sent in the previous timestep. $\delta_2$ is
the number of messages $i$ sends in response to messages
received from others, which again is a sum over contributions
from the actions (a)-(c):



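$$\delta_2 = \beta\left(\alpha_0 \langle k\rangle_{p_0} + \alpha_1 \langle k\rangle_{p_1} + \alpha_2 \langle k\rangle_{p_{\alpha_2}}\right) \qquad (5)$$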



The terms on the right are, respectively, requests from
any node in the network (distributed as $p_0$), messages
from existing contacts (distributed as $p_1$), and back-and-forth
messages (distributed as $p_{\alpha_2}$). Each iteration
of back-and-forth communication acts as a shift
in the distribution of recipients relative to the distribution
of senders, $T p_l = \beta p_{l+1}$. The distribution $p_{\alpha_2}$
accounts for all high-order shifts. To close the equations
for $\alpha_2$ and $\delta_2$, we use that the reply probability for
each iteration is reduced by a factor $\beta$ to approximate
$p_{\alpha_2} \approx p_2$. Inserting Eq. (4), $\alpha_0$ and $\alpha_1$ in Eq. (5) yields
$\delta_2 = \beta\,(\alpha_0 \langle k\rangle + \alpha_1 k_c + \beta k_i k_c (\delta_0 + \delta_1))/f(k_i)$, where we
introduce $f(k_i) = 1 - \beta^2 k_i k_c < 1$. Summing over $\delta_0$, $\delta_1$
and $\delta_2$ we get

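$$\delta = r_{\rm req} + r_{\rm ini} k_i + \frac{\beta k_i}{f(k_i)}\left[r_{\rm req} + r_{\rm ini} k_c + \beta r_{\rm req} k_c + \beta r_{\rm ini} k_c k_i\right] \qquad (6)$$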

Here the first three terms (referred to as $\delta_<$) are messages
sent to recipients selected according to $p_1$ and with
mean degree $k_c$. The other terms, $\delta_>$, are messages to
recipients distributed according to the higher-order
distribution $p_2$, which has a mean $k^* = \langle k^3\rangle/\langle k^2\rangle > k_c$, and
contribute significantly only for large $k_i$. The mean of the
weighted recipient degree (weighted by number of messages
received) is $k^w_{\rm rec} = k_c \delta_</\delta + k^* \delta_>/\delta$, which departs
from $k_c$ when $\delta_>$ becomes appreciable (Fig. 3). For low $k_i$
($k_i = 1$), the ratio of sent to received messages becomes
$\delta/\alpha \approx (r_{\rm req} + r_{\rm ini})/(r_{\rm req}/\langle k\rangle + r_{\rm ini}) > 1$. Conversely,
$\delta/\alpha \approx 1$ when $k_i \approx \langle k\rangle$, hence an average node has a
"balanced" email account. When $k_i$ becomes larger than
$\langle k\rangle$, $i$ will increasingly receive requests and responses to
its messages (Fig. 2).
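
To get a feeling for the size of the imbalance at low degree, the ratio $(r_{\rm req} + r_{\rm ini})/(r_{\rm req}/\langle k\rangle + r_{\rm ini})$ can be evaluated with the parameter values quoted in the text ($r_{\rm req} = 0.13$, $r_{\rm ini} = 0.023$, mean degree 5.4). This is only an arithmetic illustration of the low-degree limit; the function name is hypothetical.

```python
def low_degree_ratio(r_req, r_ini, mean_k):
    """Sent/received ratio for k_i = 1: (r_req + r_ini) / (r_req/<k> + r_ini)."""
    return (r_req + r_ini) / (r_req / mean_k + r_ini)

ratio = low_degree_ratio(r_req=0.13, r_ini=0.023, mean_k=5.4)  # roughly 3.25
```

Under these values a degree-one node sends roughly three messages for every message it receives, consistent with the statement that the ratio exceeds one at low degree.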

The Dunbar number $k_D$ is the degree where $\delta$ reaches
the capacity limit ($\delta = N_{\max}$) and $\delta/k$ is maximal. The
scale break in the degree distribution ($k_{sb} \approx 250$), Fig. 2
(inset), and $k_D \approx 230$, Fig. 4, nearly coincide. In fact
$k_{sb}$ is related to $k_D$ because nodes beyond $k_D$ have a
reduced probability to form new links. To determine $k_{sb}$,
consider the evolution of the nodes' degree in the limit
where all capacity is used for replying, hence $\delta_1 = 0$.
Using that $\delta_0 \ll \delta_2$, we get $\delta \approx \delta_2 = N_{\max}$, which in
turn yields $k_{sb} = \beta^{-1} N_{\max} f(k_{sb})\,(r_{\rm req} + r_{\rm ini} k_c)^{-1}$. $k_{sb}$ is
found by solving this implicit equation; $k_D$ then follows
from Eq. (6).

The parameters $r_{\rm ini} = 0.023$, $r_{\rm req} = 0.13$ and $N_{\max} = 12$
are determined by the data in Fig. 2. From $r_{\rm req}$ and $\gamma$
we obtain $\beta \approx 0.004$. Larger $N_{\max}$ increases the limit of
$\delta$. $r_{\rm req}$ is constrained by the offset at low $\alpha$ and $r_{\rm ini}$ affects
the skewness of the curve, which follows from analysis of
Eqs. (4) and (6). Fig. 4 shows the model prediction of
$\delta/k_i$ and the corresponding email data. We complement
our analysis with numerical computations. Using a large
number of nodes, $\mathcal{N} = 10\,000$, we iterate actions (a)-(c)
until steady state is reached. While the mean-field prediction
(Figs. 3 and 4) is close to the numerical solution,
some differences exist, e.g. at small $k$, $p(k)$ is not a strict
power-law in the numerical solution due to the discrete- 

FIG. 4. Average number of emails sent per link per day. Gray
circles represent the average activity of all users of a certain
out-degree and the red (blue) lines represent coarse-grained
mean (median) values in the real communication network;
boxes mark upper and lower quartiles. Best fit with the model
(simulation) is shown by the green lines (diamonds). At small
$k$, $f(k) \approx 1$ (Eq. (6)) and $\delta/k$ is a superposition of a term
$\sim k$ due to the final quadratic term and a decaying term
$\sim 1/k$ from the constant. At $k > k_D$, nodes are limited to $N_{\max}$
messages per day, hence $\delta/k \approx N_{\max}/k$.



we obtain $P_{ij}(d\Delta t) = P_{ij}(\Delta t)^d$. To produce the projected
curves in Fig. 3, $P_{ij}(d\Delta t)$ is applied to both axes,
$k$ and $k_{\rm rec}(k)$. Averaging w.r.t. all recipients $j$ (distributed
as $p_1$), the projected sender out-degree becomes
$k_i\,\langle 1 - P_{ij}^d\rangle_{p_1}$. Similarly one can consider the projection
of the mean recipient degree, leading to a similar
reduction in the degree for finite-time data. For example,
consider the data for the quarter period ($d \approx 23$)
in Fig. 3. We have $P_{ij}(\Delta t)^d \approx (1 - r_{\rm ini})^d$ and therefore
the ratio of projected to actual out-degree is below $1/2$; hence less than half the links persist.
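
The quarter-period estimate can be reproduced with a short calculation. The sketch assumes, as in the approximation above, that a link is inactive on a given day with probability $1 - r_{\rm ini}$, and it uses $r_{\rm ini} = 0.023$ and $d = 23$ as quoted in the text; the function name is hypothetical.

```python
def persisting_fraction(r_ini, d):
    """Fraction of out-links seen at least once in d days: 1 - (1 - r_ini)^d."""
    p_inactive_all_days = (1.0 - r_ini) ** d
    return 1.0 - p_inactive_all_days

frac_quarter = persisting_fraction(r_ini=0.023, d=23)  # roughly 0.41
frac_full = persisting_fraction(r_ini=0.023, d=91)     # about three months of data
```

With these numbers roughly 41 percent of the links survive the projection onto the quarter period, below one half as stated, while a window of about 91 days retains a much larger share.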

Concluding remarks - The finite capacity of agents in
social networks induces an upper limit on the number
of possible interactions [11, 13-15]. We propose a comprehensive
model that reconciles structure and dynamics
of networks with finite-capacity agents that dynamically
form or lose links. In agreement with a complete set of
email data and results from other social networks [13, 23],
our model predicts a scale-free degree distribution up to
a distinct scale-break induced by the capacity limit. Further,
as agents gain importance in the network, the per-link
activity first increases with node degree, peaks at
intermediate degrees and declines at large degrees. The
model and data therefore support the hypothesis of a
general limit on the number (150-250) of active social
relations that an individual can maintain [10] and are in
agreement with empirical observations on social networks

QUO!]. 
ness of $k$. Further, the simulation gives a smooth peak in
$\delta/k$ (Fig. 4) which is narrower than in the empirical data.
This is due to slight overestimation of the repeated back-and-forth
communication between well-connected nodes
($k \gtrsim 200$) relative to the data. We have also simulated
the information flow (Fig. 1) and obtain similar results.
Finally, the average local clustering coefficient of the empirical
and simulated networks is relatively small, $\approx 0.04$
for both (similar clustering coefficient $\approx 0.06$ [53] and
$k_D \approx 150$ to 200 have been reported for other communication
networks [TT | H5 l 124]). We further checked the
robustness of the model to variations [16].

Discussion - The data were recorded over three months
and the communication network is therefore a finite-time
projection of the real network. The projection reduces
the number of links. More active links will more
likely persist through the projection than less active links.
Fig. 3 shows the mean recipient degree $k_{\rm rec}$ as function
of the sender degree $k_i$ for three observation time
intervals. Consider again Eq. (6) and remember that recipients
of the $\delta_<$ ($\delta_>$) messages are distributed as $p_1$
($p_2$). When observing only a single day, the probability
for an out-link from $i$ to $j$ not to be active
is $P_{ij}(\Delta t) = 1 - \delta_< k_j/(k_c k_i) - \delta_> k_j/(k^* k_i)$. For $d$ days